Abstract: Regenerating codes are a class of codes for distributed
storage networks that provide reliability and availability
of data, and also perform efficient node repair. Another important 
aspect of a distributed storage network is its security. In this 
paper, we consider a threat model where an eavesdropper may 
gain access to the data stored in a subset of the storage nodes, 
and possibly also, to the data downloaded during repair of some 
nodes. We provide explicit constructions of regenerating codes 
that achieve information-theoretic secrecy capacity in this setting. 



I. Introduction 

We consider a distributed storage system consisting of n 
storage nodes in a network, each having a capacity to store
$\alpha$ symbols over a finite field $\mathbb{F}_q$ of size q. Data corresponding
to B message symbols (the message), each drawn uniformly
and independently from $\mathbb{F}_q$, is to be dispersed across these n
nodes. An end-user (called a data-collector) must be able to 
reconstruct the entire message by downloading the data stored 
in any subset of k nodes. If data-reconstruction was the only 
requirement, any [n, k] maximum-distance-separable (MDS) 
code such as a Reed-Solomon code would suffice. 

A second important aspect of a distributed storage system 
is the handling of node failures. When a storage node fails, it 
is replaced by a new, empty node. The replacement node is 
required to obtain the data that was previously stored in the 
failed node by downloading data from the remaining nodes 
in the network. A typical means of accomplishing this is to 
download the entire message from the network, and extract 
the desired data from it. However, downloading the entire
message, when the replacement node eventually stores only a
fraction 1/k of it, is clearly wasteful of the network resources.

Recently, Dimakis et al. [1] introduced a new class of
codes called 'regenerating codes' which are efficient with
respect to both storage space utilization and the amount of data
downloaded for repair (termed repair-bandwidth). Regenerating
codes permit node repair by downloading $\beta$ symbols from
each node in any subset of d ($\geq k$) remaining nodes, and the total
repair-bandwidth $d\beta$ is typically much smaller than the message size
B. In [1], the authors also establish that the parameters involved
must necessarily satisfy the bound:

$$B \leq \sum_{i=0}^{k-1} \min\left(\alpha, (d-i)\beta\right). \tag{1}$$

It can be deduced (see [1]) that achieving equality in (1), with
parameters B, k and d fixed, leads to a tradeoff between the
storage space $\alpha$ and the repair-bandwidth $d\beta$. In this tradeoff,
the case of minimizing $\alpha$ first and then $\beta$ (for fixed d) is
termed the minimum storage regenerating (MSR) case,
while carrying out the minimization in the reverse order is
termed the minimum bandwidth regenerating (MBR) case.
More details on the MSR and MBR cases are provided later
in the paper. Explicit constructions of MSR and MBR codes
achieving this bound can be found in [2], [4]-[6].
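The two extreme points of the tradeoff can be evaluated numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the MSR storage value $\alpha = (d - k + 1)\beta$ is taken from the MSR relation $d\beta = \alpha + (k-1)\beta$ stated later in the paper.

```python
def cutset_bound(k, d, alpha, beta):
    """Right-hand side of bound (1): sum over i = 0..k-1 of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def mbr_point(k, d, beta=1):
    """MBR point: alpha = d * beta, so each term of the sum equals (d - i) * beta."""
    alpha = d * beta
    return alpha, cutset_bound(k, d, alpha, beta)


def msr_point(k, d, beta=1):
    """MSR point: alpha = (d - k + 1) * beta, so each term of the sum equals alpha."""
    alpha = (d - k + 1) * beta
    return alpha, cutset_bound(k, d, alpha, beta)


if __name__ == "__main__":
    # Parameters k = 3, d = 4, beta = 1 (the values used in the paper's examples).
    print("MBR (alpha, B):", mbr_point(3, 4))
    print("MSR (alpha, B):", msr_point(3, 4))
```

For k = 3, d = 4 and $\beta = 1$, the MBR point stores more per node ($\alpha = 4$) but supports a larger message for the same repair-bandwidth, while the MSR point satisfies $B = k\alpha$.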

The focus of the present paper is on an additional, important 
aspect of distributed storage systems, namely, security of the 
data. Nowadays, individuals as well as businesses are increas- 
ingly storing their data over untrusted networks. Peer-to-peer 
storage systems have storage nodes spread out geographically. 
Such situations make the data prone to prying adversaries that 
may gain access to the data stored in some of the nodes. An 
eavesdropper can also gain additional information by listening 
to the data downloaded during multiple instances of repair 
of these nodes. It is imperative to prevent such entities from 
gaining any useful information. The present paper constructs 
explicit codes which, while satisfying the reconstruction and 
repair requirements in the distributed storage network, prevent
such an eavesdropper from obtaining any information about 
the original message. 

The threat model considered in this paper is as follows. An
eavesdropper can gain read-access to the data stored in any
set of at most $\ell$ ($< k$) storage nodes. The eavesdropper may
also gain read-access to the data being downloaded during
(possibly multiple instances of) repair of some $\ell'$ ($\leq \ell$) of
these $\ell$ nodes. Note that the data downloaded by a replacement
node during any instance of repair also contains the data that
is eventually stored in that node. This is formalized in the
following definition.

Definition 1 ($\{\ell, \ell'\}$ secure distributed storage system):
Consider a distributed storage system in which an
eavesdropper gains access to the data stored in some
$(\ell - \ell')$ nodes, and the data stored as well as the data
downloaded during repair in some other $\ell'$ nodes. An $\{\ell, \ell'\}$
secure distributed storage system is one in which such an
eavesdropper obtains no information about the message.

We assume that the eavesdroppers have unbounded computational
power, are passive, non-collusive, and that the
underlying code is globally known. As an example of this
model, consider a peer-to-peer storage system. The $\ell'$ nodes
described above may represent nodes that are in a network
belonging to an adversary, thereby allowing the eavesdropper
to listen to all the data downloaded as these $\ell'$ nodes undergo
(possibly multiple) failures and repairs across time. On the
other hand, the $(\ell - \ell')$ nodes may represent the nodes which
may be exposed only momentarily, allowing the eavesdropper
access to only the data stored.

The problem of providing information-theoretic secrecy in
distributed storage systems can be related to the Wiretap
Channel II [7] where an eavesdropper, listening to any arbitrary
subset (of fixed size) of symbols being transmitted
over a noiseless point-to-point channel, obtains essentially
no information about the original message. While schemes
providing secrecy in a distributed storage system with only
the reconstruction requirement would follow from [7], the
requirement of addressing node-repair makes the problem
harder. Among recent results in the context of distributed
storage, the problem of securely disseminating encoded data to
the storage nodes is considered in [8], and an analysis of communication
and interaction requirements between the nodes is
provided. In [9], the authors consider the situation where data
is stored over two networks, and an eavesdropper may gain
access to any one of these networks. Connections between
optimal repair in distributed storage and communication across
multiple-access wiretap channels are established in [10].

The system model considered in the present paper is based
on the model introduced by Pawar et al. [3]. In [3], the authors
consider the case when $\ell' = \ell$ and provide an upper bound
on the number of message symbols $B^{(s)}$ that can be stored in
the information-theoretically secure system as

$$B^{(s)} \leq \sum_{i=\ell}^{k-1} \min\left(\alpha, (d-i)\beta\right). \tag{2}$$

The bound in (2) can be interpreted in the following intuitive
manner. Out of the k nodes to which a data-collector connects,
consider the case where the first $\ell$ of these nodes are
compromised. Thus, assuming the secrecy goals have been
met, these $\ell$ nodes will provide zero information about the
message symbols, and only the remaining $(k - \ell)$ nodes in
the summation in (1) provide useful information. It can be
shown that the bound in (2) is, in fact, an upper bound on the
number of message symbols in an information-theoretically
secure system for all values of $\ell'$.

In the sequel, notation pertaining to the secure version of
the code will frequently be indicated by the superscript (s).
For instance, $B^{(s)}$ denotes the number of message symbols in
a system with secrecy constraints, and B denotes the number
of message symbols in a system without secrecy constraints
(i.e., when $\ell = \ell' = 0$). Note that the difference $B - B^{(s)}$ is
the price paid for the additional secrecy constraint.
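The upper bound on $B^{(s)}$ and the resulting price of secrecy can be computed directly. The sketch below assumes the bound is the sum of $\min(\alpha, (d-i)\beta)$ over the $(k - \ell)$ uncompromised nodes, i.e. over $i = \ell, \ldots, k-1$, as the intuitive interpretation above describes. The function names are hypothetical.

```python
def secure_bound(k, d, alpha, beta, ell):
    """Upper bound on B^(s): sum over i = ell..k-1 of min(alpha, (d - i) * beta).

    With ell = 0 this reduces to the bound on B without secrecy constraints.
    """
    return sum(min(alpha, (d - i) * beta) for i in range(ell, k))


def price_of_secrecy(k, d, alpha, beta, ell):
    """Difference B - B^(s) when both bounds are met with equality."""
    return secure_bound(k, d, alpha, beta, 0) - secure_bound(k, d, alpha, beta, ell)


if __name__ == "__main__":
    # MBR point with k = 3, d = 4, beta = 1 (so alpha = d = 4) and ell = 1.
    print("B     =", secure_bound(3, 4, 4, 1, 0))
    print("B^(s) =", secure_bound(3, 4, 4, 1, 1))
    print("price =", price_of_secrecy(3, 4, 4, 1, 1))
```

At the MBR point with k = 3, d = 4 and one compromised node, the bound drops from 9 to 5 message symbols, so 4 symbols are given up for secrecy.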

In [3], the authors also show that the MBR code presented
in [4] for the parameters [n, k, d = n - 1] can be made
information-theoretically secure by making use of a nested
MDS code in the construction.

In the present paper, we provide explicit constructions for
information-theoretically secure MBR and MSR codes for:

1) MBR, all parameters [n, k, d], and

2) MSR, all parameters [n, k, d $\geq$ 2k - 2].

Each of the constructions presented is $\{\ell, \ell'\}$ information-theoretically
secure, for all values of $\ell$ and $\ell'$. The secure MBR
code presented is optimal for all $\{\ell, \ell'\}$, and the secure MSR
code presented is optimal for all values of $\ell$ when $\ell' = 0$. Thus
this also establishes the secrecy capacity of such a system for
each of these parameter values. It is unknown at present
whether or not the MSR code presented here is optimal for
$\ell' \geq 1$.

The secure codes provided in the present paper are based
on our previous work [2], where we construct explicit regenerating
codes for the parameters listed above. The codes in [2]
are based on a new Product-Matrix (PM) framework. We will
refer to the MBR and MSR codes of [2] as the PM-MBR and
PM-MSR codes respectively, and to the corresponding secure
versions constructed in the present paper as the secure PM-MBR
and the secure PM-MSR codes respectively.

While all other regenerating codes in the literature require
the number of nodes n to be equal to d+1, the PM codes [2]
do not impose any such constraint. Thus the PM codes are
well suited for distributed storage systems where the number
of nodes n may vary in time, or where the connectivity d
required for repair may be low. These codes are also linear,
i.e., each symbol in the system is a linear combination of
the message symbols. As we shall subsequently see, the PM
framework possesses two additional attributes that make it
more attractive for constructing secure codes: (a) exact-repair,
and (b) data downloaded by a node for repair is independent
of the set of d nodes to which it connects. A more detailed
discussion is provided in Section V.

The rest of the paper is organized as follows. Section II
presents the general approach followed in the paper for code
construction and for proving information-theoretic secrecy.
Section III presents the secure MBR code for all parameters
[n, k, d] and $\{\ell, \ell'\}$. Section IV presents the secure MSR
codes for all parameters [n, k, d $\geq$ 2k - 2] and $\{\ell, \ell'\}$. The
paper concludes with a discussion in Section V.

II. Approach 

We approach the problem of providing secrecy in the presence
of eavesdroppers in the following manner. To construct a
secure code for a given [n, k, d], we choose the corresponding
PM code [2] with the same values of system parameters
[n, k, d]. In the input to the PM code (without secrecy), we
replace a specific, carefully chosen set of

$$R = B - B^{(s)} \tag{3}$$

message symbols with R random symbols. Each of these
random symbols is chosen uniformly and independently from
$\mathbb{F}_q$, and is also independent of the message symbols.

If the random symbols are treated as message symbols, the 
secure code becomes identical to the original code. Hence, the 
processes of reconstruction and repair in the secure code can 
be carried out in the same way as in the original code. 

To prove $\{\ell, \ell'\}$ secrecy of our codes, we consider the
worst case scenario where an eavesdropper has access to
precisely $\{\ell, \ell'\}$ nodes. Let $\mathcal{U}$ denote the collection of the
$B^{(s)}$ message symbols, and let $\mathcal{R}$ denote the collection of
R random symbols as defined in (3). Further, let $\mathcal{E}$ denote
the collection of symbols that the eavesdropper gains access
to. For each of the codes presented in this paper, the proof
of information-theoretic secrecy proceeds in the following
manner. All logarithms are taken to the base q.

Step 1: We show that given all the message symbols $\mathcal{U}$
as side-information, the eavesdropper can recover all the R
random symbols, i.e., $H(\mathcal{R} \mid \mathcal{E}, \mathcal{U}) = 0$.

Step 2: Next we show that all but R of the symbols obtained
by the eavesdropper are functions of these R symbols, i.e.,
$H(\mathcal{E}) \leq R$.

Step 3: We finally show that the two conditions listed
in steps 1 and 2 above necessarily imply that the mutual
information between the message symbols $\mathcal{U}$ and the symbols
$\mathcal{E}$ obtained by the eavesdropper is zero, i.e., $I(\mathcal{U}; \mathcal{E}) = 0$.
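The three steps can be illustrated on a toy instance that is not one of the paper's codes: a single message symbol u masked by a single random symbol r, with the eavesdropper observing e = u + r. The field size q = 5 and all variable names below are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

q = 5  # toy prime field size (assumption for illustration)

# Joint counts of (u, e) when u and r are uniform and independent, e = u + r mod q.
joint = Counter()
for u, r in product(range(q), repeat=2):
    joint[(u, (u + r) % q)] += 1

total = q * q
count_u = Counter()
count_e = Counter()
for (u, e), c in joint.items():
    count_u[u] += c
    count_e[e] += c

# Step 1: given u and e, exactly one r is consistent, i.e. H(R | E, U) = 0.
step1 = all(
    sum(1 for r in range(q) if (u + r) % q == e) == 1
    for u in range(q) for e in range(q)
)

# Step 2: e takes at most q**R values with R = 1, hence H(E) <= R (logs to base q).
step2 = len(count_e) <= q

# Step 3: u and e are independent, i.e. P(u, e) = P(u) P(e), hence I(U; E) = 0.
step3 = all(
    joint[(u, e)] * total == count_u[u] * count_e[e]
    for u in range(q) for e in range(q)
)

if __name__ == "__main__":
    print(step1, step2, step3)
```

Here R = 1 and the eavesdropper sees a single symbol, so both conditions hold trivially; the codes in the paper establish the same two conditions for matrix-structured observations.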

III. Secure MBR codes for All [n, k, d], $\{\ell, \ell'\}$

MBR codes achieve the minimum possible repair-bandwidth:
a replacement node downloads only what it stores,
i.e., they have $d\beta = \alpha$. Substituting this in the bound in (1),
and replacing the inequality with equality, we get that in the
absence of secrecy requirements an MBR code must satisfy

$$B = \left(kd - \binom{k}{2}\right)\beta, \qquad \alpha = d\beta. \tag{4}$$
In this section, we present explicit constructions of
information-theoretically secure MBR codes for all parameter
values [n, k, d] and all $\{\ell, \ell'\}$. These codes meet the upper
bound (2) on the total number of message symbols, thus
showing that (2) is indeed the secrecy capacity at the MBR
point for all parameters. These codes are based on the PM-MBR
codes constructed in [2]. We first provide a brief
description of the PM-MBR codes, before moving on to the
construction of the secure PM-MBR codes.

We construct codes for the case $\beta = 1$, and codes for any
higher value of $\beta$ can be obtained by a simple concatenation
of the $\beta = 1$ code. In the terminology of distributed storage,
this process is known as striping. Thus an MBR code with
$\beta = 1$ has $\alpha = d$.

A. Recap of the Product-Matrix MBR codes 

The PM-MBR code [2] can be described in terms of an $(n \times \alpha)$
code matrix C, where the $\alpha$ elements in its i-th row represent
the $\alpha$ symbols stored in node i ($1 \leq i \leq n$). The code matrix
C is a product of two matrices: a fixed $(n \times d)$ encoding matrix
$\Psi$ and a $(d \times \alpha)$ message matrix M comprising the B message
symbols in a possibly redundant fashion, i.e.,

$$C = \Psi M. \tag{5}$$

Denoting the i-th row of $\Psi$ as $\psi_i^t$, the $\alpha$ symbols stored in
the i-th storage node are expressed as $\psi_i^t M$. The superscript 't'
denotes the transpose of a matrix.

In the PM-MBR code, the encoding matrix $\Psi$ and the
message matrix M are of the form

$$\Psi = \begin{bmatrix} \Phi & \Delta \end{bmatrix}, \qquad M = \begin{bmatrix} S & T \\ T^t & 0 \end{bmatrix},$$

where $\Phi$ is $(n \times k)$, $\Delta$ is $(n \times (d-k))$, S is $(k \times k)$, T is
$(k \times (d-k))$, $T^t$ is $((d-k) \times k)$, the zero block is
$((d-k) \times (d-k))$, and M is $(d \times d)$.

The matrices $\Phi$ and $\Delta$ are chosen in such a way that (a) any
k rows of $\Phi$ are linearly independent, and (b) any d rows
of $\Psi$ are linearly independent. These requirements can be
met, for example, by choosing $\Psi$ to be either a Cauchy or
a Vandermonde matrix. The choice of the matrix $\Psi$ governs
the choice of the size q of the finite field $\mathbb{F}_q$, e.g., choosing
$\Psi$ as Vandermonde allows us to use any $q \geq n$.

The matrices S and T in the message matrix M are
populated by the B message symbols,

$$B = kd - \binom{k}{2} = \frac{k(k+1)}{2} + k(d-k), \tag{6}$$

as follows. The $\frac{k(k+1)}{2}$ symbols in the upper triangular half
of the $(k \times k)$ symmetric matrix S and the $k(d-k)$ elements
in the $(k \times (d-k))$ matrix T are set equal to the B message
symbols. Note that the symmetry of matrix S makes M also
symmetric.

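Example 1: We illustrate the code with an example; this
example will also be used subsequently to illustrate the secure
code. Let n = 6, k = 3, d = 4. Then with $\beta = 1$, we get
$\alpha = d = 4$ and B = 9. We design the code over the finite
field $\mathbb{F}_7$. The $(6 \times 4)$ encoding matrix $\Psi$ can be chosen as a
Vandermonde matrix with its i-th row as $\psi_i^t = [1 \;\; i \;\; i^2 \;\; i^3]$.
The matrices S and T, and hence the message matrix M, are
populated by the 9 message symbols $\{u_i\}_{i=1}^{9}$ as

$$S = \begin{bmatrix} u_1 & u_2 & u_3 \\ u_2 & u_4 & u_5 \\ u_3 & u_5 & u_6 \end{bmatrix}, \quad T = \begin{bmatrix} u_7 \\ u_8 \\ u_9 \end{bmatrix}, \quad M = \begin{bmatrix} u_1 & u_2 & u_3 & u_7 \\ u_2 & u_4 & u_5 & u_8 \\ u_3 & u_5 & u_6 & u_9 \\ u_7 & u_8 & u_9 & 0 \end{bmatrix}.$$
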



We now describe the reconstruction and the repair processes 
in the PM-MBR code. 

1) Reconstruction: Let $\Psi_{DC} = [\Phi_{DC} \;\; \Delta_{DC}]$ be the
$(k \times d)$ submatrix of $\Psi$, corresponding to the k rows of $\Psi$ to
which the data-collector connects. Thus the data-collector has
access to the symbols $\Psi_{DC} M = [\Phi_{DC} S + \Delta_{DC} T^t \;\;\; \Phi_{DC} T]$.
By construction, the matrix $\Phi_{DC}$ is nonsingular. Hence, by
multiplying the matrix $\Psi_{DC} M$ on the left by $\Phi_{DC}^{-1}$, one can
recover first the matrix T and subsequently, the matrix S.

2) Repair: Let $\psi_f^t$ be the row of $\Psi$ corresponding to the
failed node f. Thus the d symbols stored in the failed node
are $\psi_f^t M$. The replacement for the failed node f connects to
an arbitrary set $\{h_i \mid 1 \leq i \leq d\}$ of d remaining nodes. Each of
these d nodes passes on the inner product $(\psi_{h_i}^t M)\psi_f$ to the
replacement node. Thus from these d nodes, the replacement
node obtains the d symbols $\Psi_{rep} M \psi_f$, where $\Psi_{rep}$ is the matrix
whose rows are $\psi_{h_1}^t, \ldots, \psi_{h_d}^t$. By construction, the $(d \times d)$ matrix $\Psi_{rep}$ is
invertible. This allows the replacement node to recover $M\psi_f$.
Since M is symmetric, $(M\psi_f)^t = \psi_f^t M$ which is precisely
the data stored in the node prior to failure.
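The encoding, reconstruction and repair operations above can be sketched for the parameters of Example 1 (n = 6, k = 3, d = 4 over $\mathbb{F}_7$). This is an illustrative sketch rather than the authors' implementation. The helper names are hypothetical, and the placement of $u_1, \ldots, u_9$ inside M is an assumption (upper triangle of S row by row, then T), chosen to agree with Example 2, where $u_1, u_2, u_3, u_7$ occupy the first row.

```python
from itertools import combinations

P = 7                 # field size from Example 1
N, K, D = 6, 3, 4     # n, k, d from Example 1 (beta = 1, alpha = d)


def mat_mul(A, B):
    """Matrix product over F_P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]


def mat_inv(A):
    """Inverse of a nonsingular square matrix over F_P (Gauss-Jordan elimination)."""
    n = len(A)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] % P)
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], P - 2, P)            # inverse via Fermat's little theorem
        aug[c] = [x * inv % P for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Vandermonde encoding matrix: row i is [1, i, i^2, i^3] for i = 1..n.
PSI = [[pow(i, j, P) for j in range(D)] for i in range(1, N + 1)]


def message_matrix(u):
    """Symmetric (d x d) matrix M = [[S, T], [T^t, 0]]; symbol layout is an assumption."""
    u1, u2, u3, u4, u5, u6, u7, u8, u9 = u
    return [[u1, u2, u3, u7],
            [u2, u4, u5, u8],
            [u3, u5, u6, u9],
            [u7, u8, u9, 0]]


def encode(u):
    """Code matrix C = Psi * M; row i holds the symbols stored in node i."""
    return mat_mul(PSI, message_matrix(u))


def reconstruct(nodes, stored):
    """Recover M from the rows of C held by the k nodes in `nodes` (0-based indices)."""
    psi_dc = [PSI[i] for i in nodes]
    phi_inv = mat_inv([row[:K] for row in psi_dc])
    delta_dc = [row[K:] for row in psi_dc]
    Y = mat_mul(phi_inv, stored)                  # [S + Phi^-1 Delta T^t, T]
    T = [row[K:] for row in Y]
    X = mat_mul(mat_mul(phi_inv, delta_dc), transpose(T))
    S = [[(Y[i][j] - X[i][j]) % P for j in range(K)] for i in range(K)]
    Tt = transpose(T)
    return [S[i] + T[i] for i in range(K)] + [Tt[i] + [0] * (D - K) for i in range(D - K)]


def repair(failed, helpers, C):
    """Rebuild the row of C for node `failed` from one symbol per helper node."""
    psi_f = PSI[failed]
    received = [[sum(c * x for c, x in zip(C[h], psi_f)) % P] for h in helpers]
    m_psi_f = mat_mul(mat_inv([PSI[h] for h in helpers]), received)   # M psi_f
    return [row[0] for row in m_psi_f]            # equals psi_f^t M since M is symmetric


if __name__ == "__main__":
    u = [3, 1, 4, 1, 5, 2, 6, 5, 3]
    C = encode(u)
    ok_rec = all(reconstruct(list(s), [C[i] for i in s]) == message_matrix(u)
                 for s in combinations(range(N), K))
    ok_rep = all(repair(f, [h for h in range(N) if h != f][:D], C) == C[f]
                 for f in range(N))
    print(ok_rec, ok_rep)
```

Each helper transmits a single symbol, so repair downloads d = 4 symbols in total, which equals the $\alpha = 4$ symbols the node stores.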



B. Information-theoretic Secrecy in the PM-MBR Code 

For the MBR code, we have $d\beta = \alpha$, i.e., a replacement
node stores all the data that it downloads during its repair.
Thus an eavesdropper does not obtain any extra information
from the data that is downloaded for repair. Hence for an MBR
code, we can assume without loss of generality that $\ell' = 0$.

In this section, we will construct codes that achieve the
upper bound in (2) at the MBR point. Substituting $\alpha = d\beta$
in (2) and replacing the inequality with equality, we get that
such a code must necessarily satisfy

$$B^{(s)} = \left(kd - \binom{k}{2}\right)\beta - \left(\ell d - \binom{\ell}{2}\right)\beta. \tag{7}$$




We now construct an $\{\ell, \ell'\}$ secure MBR code satisfying (7),
based on the PM-MBR code. We denote the PM-MBR
code [2] described above as $\mathcal{C}$, and the secure PM-MBR code
constructed here as $\mathcal{C}^{(s)}$. As mentioned previously, we will
present the construction for the case $\beta = 1$.

Let $\Psi^{(s)}$ be the $(n \times d)$ encoding matrix of code $\mathcal{C}^{(s)}$. Choose
$\Psi^{(s)}$ to satisfy the following property in addition to those
required of $\Psi$: when restricted to the first $\ell$ columns, any $\ell$
rows are linearly independent. The choice of $\Psi^{(s)}$ as a Cauchy
or Vandermonde matrix satisfies this additional property as
well. We now modify the message matrix M of code $\mathcal{C}$ to
obtain the message matrix $M^{(s)}$ of code $\mathcal{C}^{(s)}$. Replace the

$$R = B - B^{(s)} = \ell d - \binom{\ell}{2} \tag{8}$$

message symbols in the first $\ell$ rows (and hence first $\ell$ columns)
of the symmetric matrix M by R random symbols. Each
random symbol is chosen independently and uniformly across
the elements of $\mathbb{F}_q$. Thus the $(n \times \alpha)$ code matrix for the secure
PM-MBR code $\mathcal{C}^{(s)}$ is given by $C^{(s)} = \Psi^{(s)} M^{(s)}$.

Example 2: We will use the PM-MBR code in Example 1
to obtain a secure PM-MBR code for [n = 6, k = 3, d = 4]
with $\ell = 1$. From (7) with $\beta = 1$ we get $B^{(s)} = 5$. Thus we
have $R = B - B^{(s)} = 4$. We replace the four message symbols
$u_1$, $u_2$, $u_3$ and $u_7$ in Example 1 with random symbols $r_1$, $r_2$,
$r_3$ and $r_7$ drawn uniformly and independently from $\mathbb{F}_7$ to get
the new message matrix $M^{(s)}$ as:

$$M^{(s)} = \begin{bmatrix} r_1 & r_2 & r_3 & r_7 \\ r_2 & u_4 & u_5 & u_8 \\ r_3 & u_5 & u_6 & u_9 \\ r_7 & u_8 & u_9 & 0 \end{bmatrix}. \tag{9}$$

Since the matrix $\Psi$ in Example 1 is a Vandermonde matrix
which already satisfies the additional property, we retain it in
the new code, i.e., $\Psi^{(s)} = \Psi$. Thus the secure PM-MBR code
for the desired parameters is given by $C^{(s)} = \Psi^{(s)} M^{(s)}$.
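For a code this small, secrecy against a single compromised node can be checked by exhaustive enumeration. The sketch below is illustrative. It assumes that $M^{(s)}$ is the symmetric matrix whose first row and column hold $r_1, r_2, r_3, r_7$, with the message symbols $u_4, u_5, u_6, u_8, u_9$ in the remaining positions of the layout assumed for Example 1. For a fixed message it enumerates all $7^4$ choices of the random symbols and checks that the node's content takes every value in $\mathbb{F}_7^4$ exactly once, so its distribution is uniform regardless of the message.

```python
from itertools import product

P = 7
N, D = 6, 4
# Vandermonde encoding matrix: row i is [1, i, i^2, i^3] for i = 1..n.
PSI = [[pow(i, j, P) for j in range(D)] for i in range(1, N + 1)]


def secure_message_matrix(r, u):
    """r = (r1, r2, r3, r7) random symbols; u = (u4, u5, u6, u8, u9) message symbols."""
    r1, r2, r3, r7 = r
    u4, u5, u6, u8, u9 = u
    return [[r1, r2, r3, r7],
            [r2, u4, u5, u8],
            [r3, u5, u6, u9],
            [r7, u8, u9, 0]]


def node_content(i, M):
    """The alpha = 4 symbols stored in node i (0-based): row i of Psi times M."""
    return tuple(sum(PSI[i][a] * M[a][b] for a in range(D)) % P for b in range(D))


def observations(i, u):
    """Set of distinct contents of node i over all choices of the random symbols."""
    return {node_content(i, secure_message_matrix(r, u))
            for r in product(range(P), repeat=4)}


def node_view_is_uniform(i, u):
    # 7**4 distinct outputs from 7**4 equally likely inputs means the view is uniform.
    return len(observations(i, u)) == P ** 4


if __name__ == "__main__":
    messages = [(0, 0, 0, 0, 0), (1, 2, 3, 4, 5), (6, 6, 6, 6, 6)]
    print(all(node_view_is_uniform(i, u) for i in range(N) for u in messages))
```

Because the view of any single node is uniform for every message tried, it carries no information that distinguishes one message from another, which is the $\ell = 1$ secrecy property.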

The following theorems prove the properties of reconstruc- 
tion, repair and secrecy in the secure PM-MBR code. 

Theorem 1 (Reconstruction and Repair): In code $\mathcal{C}^{(s)}$
presented above, a data-collector can recover all the $B^{(s)}$ message
symbols by downloading data stored in any k nodes, and a
failed node can be repaired by downloading one symbol each
from any d remaining nodes.

Proof: Treating the random symbols also as message
symbols, the secure PM-MBR code $\mathcal{C}^{(s)}$ becomes identical
to the PM-MBR code $\mathcal{C}$. Thus reconstruction and repair in
$\mathcal{C}^{(s)}$ are identical to that in $\mathcal{C}$. ■

Theorem 2 (Information-theoretic Secrecy): In code $\mathcal{C}^{(s)}$
designed for a given value of $\ell$, an eavesdropper having access
to at most $\ell$ nodes gets no information pertaining to the
message.

Proof: Let $\Psi^{(s)}_{eve}$ be the $(\ell \times d)$ submatrix of $\Psi^{(s)}$,
corresponding to the $\ell$ rows of $\Psi^{(s)}$ to which the eavesdropper
has gained access. Thus the eavesdropper has access to the $\ell d$
symbols in the $(\ell \times d)$ matrix $E^{(s)}$ defined as

$$E^{(s)} = \Psi^{(s)}_{eve} M^{(s)}. \tag{10}$$

Following the approach described in Section II, we first
show that given the message symbols as side information, an
eavesdropper can decode all the random symbols. To this end,
define $\tilde{M}^{(s)}$ as a $(d \times d)$ matrix obtained by setting all message
symbols in $M^{(s)}$ to zero. Thus $\tilde{M}^{(s)}$ has its first $\ell$ rows and
first $\ell$ columns identical to that of $M^{(s)}$, and zeros elsewhere.
Let

$$\tilde{E}^{(s)} = \Psi^{(s)}_{eve} \tilde{M}^{(s)}, \tag{11}$$

which are the $\ell d$ symbols that the eavesdropper has access
to, given the message symbols as side information. Recall the
property of $\Psi^{(s)}_{eve}$ wherein any $\ell$ rows, when restricted to the
first $\ell$ columns, are independent. Thus, recovering the R random
symbols from $\tilde{E}^{(s)}$ is identical to data reconstruction in the
original PM-MBR code $\mathcal{C}$ designed for $[\hat{n} = n, \hat{k} = \ell, \hat{d} = d]$,
$\hat{\ell} = 0$. Thus, given the message symbols, the eavesdropper can
decode all the random symbols.

The next step is to show that $H(\mathcal{E}) \leq R$. From the value of
R in (8), it suffices to show that out of the $\ell d$ symbols that the
eavesdropper has access to, $\binom{\ell}{2}$ of them are functions (linear
combinations) of the rest. Consider the $(\ell \times \ell)$ matrix

$$E^{(s)} \left(\Psi^{(s)}_{eve}\right)^t = \Psi^{(s)}_{eve} M^{(s)} \left(\Psi^{(s)}_{eve}\right)^t. \tag{12}$$

Since $M^{(s)}$ is symmetric, the $(\ell \times \ell)$ matrix in (12) is also
symmetric. Thus $\binom{\ell}{2}$ dependencies among the elements of $E^{(s)}$
can be described by the $\binom{\ell}{2}$ upper-triangular elements of the
expression

$$E^{(s)} \left(\Psi^{(s)}_{eve}\right)^t - \Psi^{(s)}_{eve} \left(E^{(s)}\right)^t = 0. \tag{13}$$

Using the linear-independence property of the rows of $\Psi^{(s)}_{eve}$, it
can be shown that these $\binom{\ell}{2}$ redundant equations are linearly
independent. Thus the eavesdropper has access to at most
$\ell d - \binom{\ell}{2}$ independent symbols, i.e., $H(\mathcal{E}) \leq R$.

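We have shown that in the secure PM-MBR code, steps 1
and 2 of the approach described in Section II hold true. The
final part of the proof, Step 3, establishes that the eavesdropper
obtains no information about the message.

$$\begin{aligned}
I(\mathcal{U}; \mathcal{E}) &= H(\mathcal{E}) - H(\mathcal{E} \mid \mathcal{U}) && (14) \\
&\leq R - H(\mathcal{E} \mid \mathcal{U}) && (15) \\
&= R - H(\mathcal{E} \mid \mathcal{U}) + H(\mathcal{E} \mid \mathcal{U}, \mathcal{R}) && (16) \\
&= R - I(\mathcal{E}; \mathcal{R} \mid \mathcal{U}) && (17) \\
&= R - H(\mathcal{R} \mid \mathcal{U}) && (18) \\
&= R - R && (19) \\
&= 0, && (20)
\end{aligned}$$

where (15) follows from the result of Step 2; (16) follows
since every symbol in the system is a function of $\mathcal{U}$ and $\mathcal{R}$,
giving $H(\mathcal{E} \mid \mathcal{U}, \mathcal{R}) = 0$; (18) follows from the result of Step 1,
since $I(\mathcal{E}; \mathcal{R} \mid \mathcal{U}) = H(\mathcal{R} \mid \mathcal{U}) - H(\mathcal{R} \mid \mathcal{E}, \mathcal{U})$;
and (19) follows since the random symbols are independent
of the message symbols. Since mutual information is
non-negative, $I(\mathcal{U}; \mathcal{E}) = 0$. ■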
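For a linear code, Steps 1 and 2 reduce to rank conditions that can be checked mechanically. Write the eavesdropper's observation as a linear map of the random symbols (matrix $G_r$) plus a linear map of the message symbols (matrix $G_u$). Step 1 holds when $G_r$ has full row rank R, and Step 2 holds when stacking $G_u$ under $G_r$ does not increase the rank. The sketch below applies this check, which restates the argument and is not taken from the paper, to the secure PM-MBR code with n = 6, k = 3, d = 4 over $\mathbb{F}_7$ and a Vandermonde encoding matrix. All function names are hypothetical.

```python
from itertools import combinations

P = 7
N, K, D = 6, 3, 4
# Vandermonde encoding matrix: row i is [1, i, i^2, i^3] for i = 1..n.
PSI = [[pow(i, j, P) for j in range(D)] for i in range(1, N + 1)]

# Distinct symbol positions (row <= col) of the symmetric matrix M; the
# lower-right (d-k) x (d-k) block is zero and carries no symbols.
POSITIONS = [(a, b) for a in range(D) for b in range(a, D) if a < K]


def rank_mod_p(rows):
    """Rank over F_P of the matrix whose rows are given (Gaussian elimination)."""
    rows = [list(r) for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(ncols):
        piv = next((r for r in range(rank, len(rows)) if rows[r][c] % P), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][c], P - 2, P)
        rows[rank] = [x * inv % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][c] % P:
                f = rows[r][c]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def observation_row(pos, nodes):
    """Flattened contents of the nodes in `nodes` when the symbol at `pos` is 1, others 0."""
    a, b = pos
    M = [[0] * D for _ in range(D)]
    M[a][b] = M[b][a] = 1
    return [sum(PSI[i][x] * M[x][y] for x in range(D)) % P
            for i in nodes for y in range(D)]


def is_secure(ell, nodes):
    """Rank test for a code designed for `ell`, with the eavesdropper reading `nodes`."""
    random_pos = [p for p in POSITIONS if p[0] < ell]      # first ell rows and columns
    message_pos = [p for p in POSITIONS if p[0] >= ell]
    G_r = [observation_row(p, nodes) for p in random_pos]
    G_u = [observation_row(p, nodes) for p in message_pos]
    return rank_mod_p(G_r + G_u) == rank_mod_p(G_r) == len(random_pos)


if __name__ == "__main__":
    print(all(is_secure(ell, s) for ell in (1, 2) for s in combinations(range(N), ell)))
    print(is_secure(1, (0, 1)))   # two nodes read, but the code was designed for ell = 1
```

The second call illustrates that the guarantee is tied to the design value of $\ell$: a code built for $\ell = 1$ stores 5 message symbols, more than the bound permits for $\ell = 2$, so it cannot remain secure when two nodes are read.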



IV. Secure MSR codes for all [n, k, d $\geq$ 2k-2], $\{\ell, \ell'\}$

MSR codes achieve the minimum possible storage at each
node. Since a data-collector connecting to any k nodes should
be able to recover all the B message symbols, each node
must necessarily store at least a fraction $\frac{1}{k}$ of the entire data.
Hence for an MSR code we have $\alpha = \frac{B}{k}$. It follows from (1)
(replacing the inequality with equality) that in the absence of
secrecy requirements an MSR code must satisfy

$$B = k\alpha, \qquad d\beta = \alpha + (k-1)\beta. \tag{21}$$

From (21) we see that, in general, for an MSR code $d\beta > \alpha$.
Thus the amount of data downloaded during repair is greater
than what is eventually stored. This requires us to distinguish
between the situations when the eavesdropper has access to
only the data stored in a node, and when it has access to the
data downloaded during repair. Note that the data downloaded
by a replacement node during repair also contains the data that
is eventually stored in it.

In this section we present explicit constructions of
information-theoretically secure MSR codes for all parameter
values [n, k, d $\geq$ 2k - 2] and all $\{\ell, \ell'\}$. The secure MSR
codes are based on the PM-MSR codes presented in [2].

A. Recap of the Product-Matrix MSR codes 

We first provide a brief description of the PM-MSR
code [2]. The code is designed for the case d = 2k - 2, and
can be extended to d > 2k - 2 via shortening (see [2], [3]
for a detailed description of shortening in MSR codes). As in
the MBR case, we construct codes for the case when $\beta = 1$.
Setting d = 2k - 2 and $\beta = 1$ in (21) gives

$$B = \alpha(\alpha + 1), \qquad \alpha = k - 1, \qquad d = 2\alpha. \tag{22}$$

The PM-MSR code $\mathcal{C}$ in [2] can be described in terms
of an $(n \times \alpha)$ code matrix $C = \Psi M$, with the i-th row of
C containing the $\alpha$ symbols stored in node i. The $(n \times d)$
encoding matrix $\Psi$ is of the form $\Psi = [\Phi \;\; \Lambda\Phi]$, where $\Phi$
is an $(n \times \alpha)$ matrix and $\Lambda$ is an $(n \times n)$ diagonal matrix
satisfying: (a) any $\alpha$ rows of $\Phi$ are linearly independent, (b)
any d rows of $\Psi$ are linearly independent, and (c) the diagonal
elements of $\Lambda$ are all distinct. The $((d = 2\alpha) \times \alpha)$ message
matrix M is of the form $M = [S_1 \;\; S_2]^t$, where $S_1$ and $S_2$
are $(\alpha \times \alpha)$ symmetric matrices. The two matrices $S_1$ and $S_2$
together contain $\alpha(\alpha + 1)$ distinct symbols, and these positions
are populated by the $B = \alpha(\alpha + 1)$ message symbols. This
completes the description of the code construction.

A description of the reconstruction and repair operations
under this code can be found in [2]. The repair algorithm
in [2] is such that the data downloaded by any node for repair
is independent of the set of d nodes to which it connects.
This property is highly advantageous while constructing secure
codes, as discussed in Section V.

B. Information-theoretic Secrecy in the PM-MSR Code 
For the MSR case, from (2) we get

$$B^{(s)} \leq (k - \ell)\alpha. \tag{23}$$

On the other hand, the $\{\ell, \ell'\}$ secure MSR codes constructed
in the present paper (for d $\geq$ 2k - 2) achieve

$$B^{(s)} = (k - \ell)(\alpha - \ell'\beta). \tag{24}$$

Thus our codes are optimal for $\ell' = 0$. As mentioned
previously, it is unknown at present whether or not our
codes are optimal when $\ell' \geq 1$.

The expression for $B^{(s)}$ in (24) can be interpreted as
follows. Consider a data-collector attempting to reconstruct
the message from the data stored in some k nodes, and an
eavesdropper having access to some $\ell$ of these k nodes. These
$\ell$ nodes will not provide any useful information, thus resulting
in the first term $(k - \ell)$ in the product. Furthermore, the
eavesdropper may have access to the data passed for repair
of some $\ell'$ of the $\ell$ nodes, and hence to the $\ell'\beta$ (potentially
distinct) symbols passed by each of the remaining $(k - \ell)$
nodes during repair. These symbols should not reveal any
information, and hence the second term $(\alpha - \ell'\beta)$.

We now describe the construction of the secure PM-MSR
code (for $\beta = 1$). We retain the notation used in Section III-B.
Choose $\Psi^{(s)}$ such that it satisfies the following property in
addition to those required for $\Psi$: when restricted to the first $\ell$
columns, any $\ell$ rows of $\Psi^{(s)}$ are linearly independent. Next,
define a collection $\mathcal{R}$ of

$$R = B - B^{(s)} = \ell\alpha + (k - \ell)\ell' \tag{25}$$

random symbols picked independently with a uniform distribution
over the elements of $\mathbb{F}_q$, where (25) follows from (22)
and (24). Use these R random symbols to replace the following
R symbols in the message matrix M of code $\mathcal{C}$, to obtain
matrix $M^{(s)}$: the $\ell\alpha - \binom{\ell}{2}$ symbols in the first $\ell$ rows (and
hence the first $\ell$ columns) of the symmetric matrix $S_1$, the
$\binom{\ell}{2}$ symbols in the intersection of the first $(\ell - 1)$ rows and
first $(\ell - 1)$ columns of the symmetric matrix $S_2$, and the
$(k - \ell)\ell'$ remaining symbols in the first $\ell'$ rows (and hence
the first $\ell'$ columns) of $S_2$. The secure PM-MSR code is given
by $C^{(s)} = \Psi^{(s)} M^{(s)}$.
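The counting in this construction can be checked by enumerating the replaced positions explicitly. The sketch below is illustrative and its function names are hypothetical. It lists the distinct (upper-triangular) positions of $S_1$ and $S_2$ that receive random symbols for $\beta = 1$, $\alpha = k - 1$, and compares their number with $R = \ell\alpha + (k - \ell)\ell'$.

```python
def replaced_positions(k, ell, ell_prime):
    """Distinct positions (matrix, row, col), 0-based with row <= col, given random symbols.

    S1: every position in the first ell rows (and, by symmetry, columns).
    S2: the leading (ell - 1) x (ell - 1) block, plus every position in the
        first ell_prime rows (and, by symmetry, columns).
    """
    alpha = k - 1
    upper = [(a, b) for a in range(alpha) for b in range(a, alpha)]
    in_s1 = {("S1", a, b) for a, b in upper if a < ell}
    in_s2 = {("S2", a, b) for a, b in upper if b < ell - 1 or a < ell_prime}
    return in_s1 | in_s2


def expected_random_symbols(k, ell, ell_prime):
    """R = ell * alpha + (k - ell) * ell_prime with alpha = k - 1 and beta = 1."""
    return ell * (k - 1) + (k - ell) * ell_prime


def secure_message_size(k, ell, ell_prime):
    """B^(s) = B - R, where B = alpha * (alpha + 1) = k * alpha."""
    return k * (k - 1) - len(replaced_positions(k, ell, ell_prime))


if __name__ == "__main__":
    for k, ell, ell_prime in [(4, 1, 0), (4, 2, 1), (5, 3, 3)]:
        n_random = len(replaced_positions(k, ell, ell_prime))
        print(k, ell, ell_prime, n_random, secure_message_size(k, ell, ell_prime))
```

The count matches R for every $0 \leq \ell' \leq \ell < k$ tried, and the remaining $B - R$ positions number $(k - \ell)(\alpha - \ell')$, the value the construction achieves for $\beta = 1$.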

The following theorems prove the properties of reconstruc- 
tion, repair and secrecy in the secure PM-MSR code. 

Theorem 3 (Reconstruction and Repair): In code $\mathcal{C}^{(s)}$
presented above, a data-collector can recover all the $B^{(s)}$ message
symbols by downloading data stored in any k nodes, and a
failed node can be repaired by downloading one symbol each
from any d remaining nodes.

Proof: As in the proof of Theorem 1, treating the random
symbols also as message symbols, the secure PM-MSR
code $\mathcal{C}^{(s)}$ becomes identical to the PM-MSR code $\mathcal{C}$. Thus
reconstruction and repair in $\mathcal{C}^{(s)}$ are identical to that in $\mathcal{C}$. ■

Theorem 4 (Information-theoretic Secrecy): In code $\mathcal{C}^{(s)}$
designed for a given value of $\ell$, an eavesdropper having access
to at most $\ell$ nodes gets no information pertaining to the
message.

Proof (Sketch): Let $\Psi^{(s)}_{eve}$ be the $(\ell \times d)$ submatrix
of $\Psi^{(s)}$, corresponding to the $\ell$ rows of $\Psi^{(s)}$ to which the
eavesdropper has gained access. Further, let $\Phi^{(s)}_{eve'}$ be the
$(\ell' \times \alpha)$ submatrix of $\Phi^{(s)}$, corresponding to the $\ell'$ nodes in
which the eavesdropper has access to the repair downloads as
well. Note that by definition of an $\{\ell, \ell'\}$ secure system, these
$\ell'$ nodes are a subset of the set of $\ell$ nodes that constitute the
matrix $\Psi^{(s)}_{eve}$. From the repair algorithm of the PM-MSR code
of [2], it turns out that the symbols $\mathcal{E}$ that the eavesdropper
gains access to comprise the elements of the $(\ell \times \alpha)$ matrix
$\Psi^{(s)}_{eve} M^{(s)}$ and the elements of the $(d \times \ell')$ matrix $M^{(s)} (\Phi^{(s)}_{eve'})^t$.

Following the approach described in Section II and in a
manner analogous to the proof of Theorem 2, it can first be
shown that given the message symbols as side information, an
eavesdropper can decode all the random symbols. Next, using
the properties of the matrix $\Psi^{(s)}_{eve}$ and the specific structure of
the message matrix $M^{(s)}$, it can also be shown that $H(\mathcal{E}) \leq R$.
Finally, the arguments in (14) to (20) establish that the
eavesdropper obtains no information about the message. ■

The extension to the case d > 2k - 2 can be achieved
via shortening ([2], [3]), using which one can use any linear
secure MSR code with parameters [n + 1, k + 1, d + 1],
$\{\ell + 1, \ell'\}$ to construct a linear secure MSR code for parameters
[n, k, d], $\{\ell, \ell'\}$.
V. Discussion

The Product-Matrix framework [2] possesses two particular
attributes that make the codes built in this framework attractive
from the security perspective. First, many codes in the literature
including those in [1] consider functional repair, wherein
the data stored in the replacement node is permitted to be 
different from that of the failed node as long as it satisfies the 
reconstruction and functional-repair properties of the system. 
This allows an eavesdropper to gain a greater amount of 
information by reading the data stored in a node across 
multiple instances of repair. On the other hand, PM codes offer 
exact-repair, wherein the data stored in the replacement node 
is identical to that in the failed node. Second, even if repair is 
exact, the data downloaded during repair of a particular node 
may depend on the set of d nodes helping in the repair process, 
and hence may be different during different instances of repair 
of that node. The PM framework, by design, ensures that 
the information contained in the symbols downloaded by the 
replacement node is independent of the identities of the helper 
nodes. This restricts information exposed to an eavesdropper 
that has access to the data downloaded during repair. 